Fix VMM handle leaks in virtual memory paths #2235

Open: fallintoplace wants to merge 2 commits into NVIDIA:main from fallintoplace:fix-vmm-cumemrelease
fallintoplace (Contributor) commented Jun 17, 2026

Summary

This patch fixes VMM handle leaks by moving the cuMemRelease() calls inside the transaction boundary, so that a release failure still triggers rollback cleanup instead of occurring after the transaction has committed.

Changes

  • In allocate(), register a rollback-aware release closure and call cuMemRelease(handle) before trans.commit(). The rollback callback is conditional, so the handle is never released twice.
  • In _grow_allocation_fast_path(), release new_handle before trans.commit() and guard the rollback callback so it releases the handle only if it has not already been released.
  • In _grow_allocation_slow_path(), keep old_handle alive until the remap-dependent rollback logic is done, then release both new_handle and old_handle before trans.commit() through guarded callbacks.
  • Add regression tests that force cuMemRelease() failures in the allocate, fast, and slow paths and assert that the rollback cleanup callbacks still execute.
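The guarded-release pattern described in the list above can be sketched as follows. This is a minimal illustration, not the code in this PR: the `Transaction` class is a hypothetical stand-in for the rollback helper, and `create`, `release`, `reserve`, and `free` are injected callables standing in for driver calls such as cuMemCreate, cuMemRelease, cuMemAddressReserve, and cuMemAddressFree. The sketch also assumes that a failed release should not be retried during rollback, which the PR text does not specify.

```python
class Transaction:
    """Hypothetical rollback helper: runs undo callbacks in reverse
    order on exit unless commit() was called."""

    def __init__(self):
        self._undo = []
        self._committed = False

    def append(self, fn, *args):
        self._undo.append((fn, args))

    def commit(self):
        self._committed = True
        self._undo.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if not self._committed:
            while self._undo:
                fn, args = self._undo.pop()
                fn(*args)
        return False  # never swallow the original exception


def allocate(create, release, reserve, free):
    """Release the handle before commit so a release failure is still
    covered by rollback (illustrative only)."""
    with Transaction() as trans:
        handle = create()
        release_attempted = False

        def release_on_rollback():
            # Guard: skip if the explicit release below was already attempted.
            if not release_attempted:
                release(handle)

        trans.append(release_on_rollback)

        addr = reserve()
        trans.append(free, addr)

        release_attempted = True
        release(handle)  # may raise; the transaction is not committed yet
        trans.commit()
        return addr


def demo_release_failure():
    """Force the release to fail and record the order of calls."""
    log = []

    def create():
        log.append("create")
        return 1

    def release(handle):
        log.append("release")
        raise RuntimeError("simulated release failure")

    def reserve():
        log.append("reserve")
        return 0x1000

    def free(addr):
        log.append("free")

    try:
        allocate(create, release, reserve, free)
    except RuntimeError:
        log.append("raised")
    return log
```

Calling `demo_release_failure()` mirrors the shape of the regression tests: the release raises, the address-range cleanup still runs during rollback, and the release itself is attempted exactly once.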

Validation

I attempted to run the targeted pytest selection in this environment, but pytest is not installed in the available interpreter, so the new regression tests were not executed here.

Notes

This preserves the existing rollback behavior while addressing the failure-after-commit edge case raised in PR review: release failures now stay within rollback coverage.

copy-pr-bot (Bot, Contributor) commented Jun 17, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions (Bot) added the cuda.core label (Everything related to the cuda.core module) Jun 17, 2026
fallintoplace force-pushed the fix-vmm-cumemrelease branch from d4b0208 to 23647d8 June 17, 2026 22:03